Analyze jobs that start first

Variables with the prefixes ARBLONN_ARB_ and ARBLONN_LONN_ contain information on all employment relationships registered through the A-scheme (a-ordningen). The unit of these data is the job (employment relationship), not the person, and an individual can in principle hold more than one job at any given time. As a result, the dataset will typically contain more observations than individuals at any given time.
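To make the unit of observation concrete, here is a minimal sketch in plain Python (not microdata.no script syntax). The records, person identifiers, and values are invented for illustration. It only shows that counting rows counts jobs, while counting distinct identifiers counts individuals.

```python
# Toy job-level records (hypothetical values): one row per employment relationship.
jobs = [
    {"personid": "p1", "position_pct": 100.0},
    {"personid": "p2", "position_pct": 60.0},
    {"personid": "p2", "position_pct": 40.0},  # p2 holds two jobs at the same time
    {"personid": "p3", "position_pct": 80.0},
]

# Number of observations in a job-level dataset (one per job).
n_jobs = len(jobs)

# Number of distinct individuals behind those jobs.
n_persons = len({row["personid"] for row in jobs})

print(n_jobs, n_persons)
```

Whenever at least one person holds several jobs, the first count exceeds the second.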

When you want to produce statistics or analyses of jobs at the individual level, you are usually interested in one selected employment relationship per individual, e.g. the main employment relationship, the job with the highest position percentage, the job with the most agreed working hours, or the job with the highest monthly salary.
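The idea of picking one job per individual by a chosen criterion can be sketched in plain Python (again, not microdata.no script syntax). The helper `select_job`, the field names, and the toy values are hypothetical and serve only to show that different criteria can select different jobs for the same person.

```python
# Toy job-level records (hypothetical values).
jobs = [
    {"personid": "p1", "position_pct": 100.0, "monthly_salary": 52000},
    {"personid": "p2", "position_pct": 60.0, "monthly_salary": 30000},
    {"personid": "p2", "position_pct": 40.0, "monthly_salary": 35000},
]


def select_job(rows, key):
    """Return one job per person: the one with the highest value of `key`.

    If several jobs share the highest value, the first one encountered is kept.
    """
    best = {}
    for row in rows:
        pid = row["personid"]
        if pid not in best or row[key] > best[pid][key]:
            best[pid] = row
    return best


by_pct = select_job(jobs, "position_pct")       # highest position percentage
by_salary = select_job(jobs, "monthly_salary")  # highest monthly salary

print(by_pct["p2"]["position_pct"], by_salary["p2"]["position_pct"])
```

For person p2 the two criteria select different jobs, which is why the selection rule should be chosen deliberately before aggregating to the individual level.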

The example below shows how to select and analyze the job that starts first for each individual.

require no.ssb.fdb:31 as db

//Create a job dataset of jobs active on 16 July 2023, and find the job that starts first per individual
create-dataset job_data_first
import db/ARBLONN_ARB_YRKE_STYRK08 2023-07-16 as occupation
import db/ARBLONN_ARB_START 2023-07-16 as job_start
import db/ARBLONN_ARB_STILLINGSPST 2023-07-16 as position_pct
import db/ARBLONN_ARB_HOVEDARBEID 2023-07-16 as main_job
import db/ARBLONN_ARB_ANSETTELSESFORM 2023-07-16 as employment_form
import db/ARBLONN_ARB_ARBEIDSTID 2023-07-16 as working_hours
import db/ARBEIDSFORHOLD_PERSON as personid

textblock
Position percentage for all active jobs as of 16/7 2023 in the job dataset:
endblock
summarize position_pct
tabulate main_job
tabulate main_job, summarize(position_pct)

//Make a copy of the job dataset before it is aggregated
clone-dataset job_data_first job_data

//Aggregate the job dataset to the individual level, with information about the start time of the first job
collapse(min) job_start -> first_job_start, by(personid)
textblock
Start date of the first job per individual. Dates are stored as the number of days since 1 January 1970:
endblock
summarize first_job_start

//Link information about the date of the first job start to the complete job dataset
merge first_job_start into job_data on personid

//Use the information to remove jobs in the job dataset that do not start first
use job_data
keep if job_start == first_job_start
textblock
Start dates of the jobs that start first, at job level. Dates are stored as the number of days since 1 January 1970.

Note that the number of observations here is higher than the number of individuals. This is because an individual can have two or more jobs that share the same earliest start date, and all of them are kept. Such cases are few, however:
endblock
summarize job_start

//Aggregate the job data to individual level (jobs with the same earliest start date are averaged), and link personal data to create person-level statistics
collapse(mean) position_pct working_hours, by(personid)

create-dataset persons
import db/BEFOLKNING_KJOENN as gender
merge gender into job_data

use job_data
textblock
Position percentage and agreed working hours for jobs that start first, by gender:
endblock
tabulate gender, missing
tabulate gender, summarize(position_pct, working_hours)
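
The logic of the script above can be mirrored on toy data in plain Python. This is a sketch only and not microdata.no syntax: the records are invented, and the helper `day_number_to_date` is a hypothetical convenience for reading the day counts as calendar dates. It shows the three steps (earliest start per person, keeping the jobs that match it, averaging per person) and why ties leave more jobs than individuals.

```python
from collections import defaultdict
from datetime import date, timedelta

# Toy job-level records (hypothetical). job_start is days since 1 January 1970.
jobs = [
    {"personid": "p1", "job_start": 18000, "position_pct": 100.0},
    {"personid": "p2", "job_start": 17000, "position_pct": 60.0},
    {"personid": "p2", "job_start": 17000, "position_pct": 20.0},  # same start: a tie
    {"personid": "p2", "job_start": 19000, "position_pct": 40.0},
]


def day_number_to_date(days):
    """Convert a day count since 1 January 1970 to a calendar date."""
    return date(1970, 1, 1) + timedelta(days=days)


# Step 1: earliest job start per person (the role of the min-collapse).
first_start = {}
for row in jobs:
    pid = row["personid"]
    first_start[pid] = min(first_start.get(pid, row["job_start"]), row["job_start"])

# Step 2: keep only jobs whose start equals the person's earliest start.
# Tied jobs are all kept, so this list can be longer than the number of persons.
first_jobs = [r for r in jobs if r["job_start"] == first_start[r["personid"]]]

# Step 3: average per person, which reduces tied jobs to one row per individual.
values = defaultdict(list)
for r in first_jobs:
    values[r["personid"]].append(r["position_pct"])
mean_pct = {pid: sum(v) / len(v) for pid, v in values.items()}

print(len(first_jobs), len(mean_pct))
print(day_number_to_date(first_start["p1"]))
```

Averaging is one way to resolve ties. Depending on the analysis, a different rule (for example keeping the tied job with the highest position percentage) may be more appropriate.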